Workflow

StartR Workshop

Maik Bieleke, PhD

University of Konstanz

November 24, 2024

Organization

Workspace

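It is highly recommended to deactivate automatic saving and restoring of the workspace. Objects left over from earlier sessions can otherwise silently affect your code and lead to unexpected results that are hard to reproduce.

You can deactivate it under Tools - Global Options - General - Workspace.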

Working directory

The working directory is the directory in which R looks for files to read and into which it writes files by default. You can check the current working directory with getwd().

getwd()
[1] "C:/Users/maikb/Google Drive/Verwaltung/homepage/workshops/slides/startr-2023-frankfurt"

Absolute and relative paths

  • Absolute paths start from the root directory (on Windows: C:\) and specify the complete location of a file or directory.

    Example: C:/Users/MMuster/Docs/intro2r/figures/pic.png

    • hard to read and write
    • break when moving files or directories
    • not portable across operating systems or users
  • Relative paths start from the current working directory and specify the location of a file or directory relative to it.

    Example: figures/workflow.png when the current working directory is the intro2r directory.
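The difference between the two kinds of paths can be made concrete in R. The following is a minimal sketch that only builds path strings, so the figures folder and the file workflow.png do not need to exist:

```r
# Relative path: interpreted relative to the current working directory
rel_path <- file.path("figures", "workflow.png")

# Equivalent absolute path for the current session
abs_path <- file.path(getwd(), rel_path)

print(rel_path)  # "figures/workflow.png"
print(abs_path)  # depends on the working directory of your session
```

The relative path stays the same on every computer, whereas the absolute path changes with the user and the operating system.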

Projects

Projects help you organize your work. Each project has its own working directory, workspace, and files. In general, it is recommended to have one project for each analysis you are working on.

Projects can be created via File - New Project, either in a new directory or in an existing directory. This will produce a .Rproj file that you can double-click to open the project.

Directory structure

It is good practice to structure the working directories of your projects in a consistent way.

An example from Douglas et al. (2023) is shown on the right. You can adjust it to your needs.
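A directory structure can also be created from R instead of by hand. The following is a minimal sketch under stated assumptions: the project location (a folder called fifa-analysis inside a temporary directory) is hypothetical, and the folder names data, figures, and scripts are illustrative rather than the structure proposed by Douglas et al. (2023).

```r
# Hypothetical project location; in practice, use your project directory
project_dir <- file.path(tempdir(), "fifa-analysis")

# Illustrative folder names
folders <- c("data", "figures", "scripts")

# Create each folder; recursive = TRUE also creates project_dir if needed
for (folder in folders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist in the project directory
list.files(project_dir)
```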

Data import and export

The rio package (= R Import and Export) is a Swiss Army knife for importing and exporting data from and to various file formats.

# install.packages("rio")
library(rio)

It has two main functions:

  • import(): Import data

    # Import and assign the fifa dataset
    fifa <- rio::import("data/fifa.csv")
  • export(): Export data

    # Export the fifa dataset
    rio::export(fifa, "data/fifa.xlsx")

Note that rio automatically detects the file format based on the file extension.
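The idea of choosing a reader based on the file extension can be sketched in base R. This is only an illustration of the principle and not how rio is implemented: read_by_extension() is a hypothetical helper, it supports just three formats, and a temporary CSV file built from the built-in mtcars data stands in for a real dataset.

```r
# Hypothetical helper (not part of rio): pick a reader from the extension
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path),
    tsv = utils::read.delim(path),
    rds = readRDS(path),
    stop("Unsupported file extension: ", ext)
  )
}

# Write a small example file to a temporary location and read it back
tmp_file <- tempfile(fileext = ".csv")
write.csv(head(mtcars), tmp_file, row.names = FALSE)
dat <- read_by_extension(tmp_file)
str(dat)
```

In practice, rio::import() covers far more formats than this sketch.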

Image: Pixabay

Exercise ✏️

  1. Create two projects for analyzing the FIFA and HIIT data.

  2. Establish directory structures with three folders:

    • data
    • figures
    • scripts
  3. Copy all datasets into the corresponding data folder.

  4. Import the datasets fifa.xlsx and hiit.xlsx.

Coding

Sections

Sections help to structure a script. They start with a comment (#) followed by the section’s name and at least four dashes (-). You can also insert them via the “Code - Insert Section” menu (Ctrl/Cmd + Shift + R).

  • Code folding: Section content can be collapsed and expanded.
  • Code outline: Sections are added to the script outline.
  • Code navigation: Sections show up in the code navigation drop-down.

# Load Data ---------------------------------------------------------------

# Plot Data ---------------------------------------------------------------

Code autocompletion

RStudio offers code completion with fuzzy narrowing, which can be triggered by pressing the Tab key. It works for:

  • packages
  • functions and arguments
  • files and paths

Snippets are templates for common code patterns. They can be triggered by entering a prefix and pressing the tab key twice.

  • ts + tab + tab: insert current date and time
  • lib + tab + tab: insert library() call

To create custom snippets: Tools - Global Options - Code - Edit Snippets.

Diagnostics

RStudio can help you to find problems in your code.

  • Syntax errors are marked with a red cross at the left margin and a red squiggly line. Hover the mouse over them to see the message.

  • Warning messages are marked with a yellow exclamation mark at the left margin and a yellow squiggly line. Hover the mouse over them to see the message.

Note that you must save the file to see the diagnostics.

Style guide

Review the tidyverse style guide for tips on

  • proper file names
  • good code syntax
  • working with functions

Documentation

Information about the session

The sessionInfo() function shows information about the current R session. This includes loaded packages and their versions, the operating system, and the R version.

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=en_US.UTF-8  LC_CTYPE=en_US.UTF-8    LC_MONETARY=en_US.UTF-8
[4] LC_NUMERIC=C            LC_TIME=en_US.UTF-8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rio_0.5.30

loaded via a namespace (and not attached):
 [1] zip_2.3.0         vctrs_0.6.3       cli_3.6.1         knitr_1.46       
 [5] rlang_1.1.1       xfun_0.43         stringi_1.7.12    forcats_1.0.0    
 [9] haven_2.5.3       jsonlite_1.8.8    data.table_1.14.8 glue_1.6.2       
[13] htmltools_0.5.6   readxl_1.4.3      hms_1.1.3         fansi_1.0.4      
[17] rmarkdown_2.24    cellranger_1.1.0  evaluate_0.21     tibble_3.2.1     
[21] fastmap_1.1.1     openxlsx_4.2.5.2  yaml_2.3.7        lifecycle_1.0.3  
[25] compiler_4.3.1    Rcpp_1.0.11       pkgconfig_2.0.3   rstudioapi_0.15.0
[29] digest_0.6.33     foreign_0.8-84    utf8_1.2.3        pillar_1.9.0     
[33] curl_5.0.2        magrittr_2.0.3    tools_4.3.1      
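To keep this information alongside an analysis, the output can be written to a text file. The following is a minimal sketch: the file name session-info.txt and the temporary directory as target are assumptions, and in a project a path inside the project directory would be the natural choice.

```r
# Capture the printed session information as a character vector
info_lines <- capture.output(sessionInfo())

# Assumed target file; replace with a path inside your project
out_file <- file.path(tempdir(), "session-info.txt")

# Write one line of output per line of the file
writeLines(info_lines, out_file)
```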

Citing R

The citation() function shows the citation for R.

citation()
To cite R in publications use:

  R Core Team (2023). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2023},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.

Citing packages

The citation() function can also be used to show the citation for a package.

citation("ggplot2")
To cite ggplot2 in publications, please use

  H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
  Springer-Verlag New York, 2016.

A BibTeX entry for LaTeX users is

  @Book{,
    author = {Hadley Wickham},
    title = {ggplot2: Elegant Graphics for Data Analysis},
    publisher = {Springer-Verlag New York},
    year = {2016},
    isbn = {978-3-319-24277-4},
    url = {https://ggplot2.tidyverse.org},
  }
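The BibTeX entry can also be saved to a file for use in a reference manager. The following is a minimal sketch that uses the citation for R itself, so that no additional package needs to be installed; the file name references.bib and the temporary directory are assumptions.

```r
# Get the citation for R and convert it to BibTeX
r_citation <- citation()
bib_entry <- toBibtex(r_citation)

# Assumed target file; replace with a path inside your project
bib_file <- file.path(tempdir(), "references.bib")

# Save the BibTeX entry
writeLines(bib_entry, bib_file)
```

For a package, replace citation() with, for example, citation("ggplot2"), provided the package is installed.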

Advanced documentation

There are several advanced ways to document your code:

  • markdown files
  • interactive notebooks
  • version control